Automated trailer generation

ABSTRACT

Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product aspects, and/or combinations and sub-combinations thereof, for generating trailers (previews) for multimedia content. An example aspect operates by generating an initial set of candidate points to generate a trailer for a media content; determining conversion data for each of the initial set of candidate points; determining an updated set of candidate points based on the conversion data; determining an estimated mean and upper bound for each of the updated set of candidate points; computing a value for each of the updated set of candidate points; generating a ranked list based on the value computed for each of the updated set of candidate points; and repeating the process until an optimal candidate point is converged upon.

BACKGROUND

Field

This disclosure is generally directed to multimedia systems, and more particularly to generating previews (e.g., trailers) for multimedia content.

Background

Streaming services for multimedia content (e.g., movies, television series, etc.) typically offer users previews of the multimedia content. These previews are often in the form of trailers. The purpose of these trailers is to boost user engagement with the multimedia content so that the user will consume the content (e.g., play, stream, download, etc., the movies and shows).

Conventionally, previews (or trailers) are generated manually. For example, one or more clips from the multimedia content are manually chosen, and those clips are manually edited to form a 30 second trailer. Often these clips are chosen at random to fit the genre of the movie and guesses/assumptions are made as to what clips will best engage users.

This conventional trailer generation approach is deficient for at least two reasons. First, it does not scale properly. When there is a lot of multimedia content to make trailers for (e.g., hundreds or thousands of movie and/or show titles), manually generating trailers is both costly and time consuming. Second, determining what clips will make for the best trailers (e.g., for user engagement) is an inexact science (e.g., the process is often subjective). Often guesses are made as to what clips will make for the best trailers. These guesses may or may not lead to high user engagement. Thus, such approaches are not optimized to generate the best trailers and/or for user engagement. Moreover, such approaches can only be carried out by human operators.

SUMMARY

Provided herein are system, apparatus, article of manufacture, method, and/or computer program product aspects, and/or combinations and sub-combinations thereof, for automated preview (e.g., trailer) generation for multimedia content. The multimedia content refers to shows, movies, series, documentaries, etc. of a multimedia service. The system, apparatus, article of manufacture, method, and/or computer program product aspects are designed to solve the technological problems outlined in the Background section. Namely, (1) scaling trailer generation, (2) optimizing for the best clips to use as trailers to better engage users, and (3) allowing computers to produce relevant and user engaging trailers that previously could only be produced by human operators.

Aspects operate by first generating an initial set of candidate points. The initial set of candidate points refers to instances of a multimedia content that can be used as starting points to generate a trailer. The initial set of candidate points can be sampled from any point of the multimedia content. In aspects, a predetermined number of initial candidate points can be chosen. For example, in aspects, this can be fifty points. A variety of methods can be used to generate the initial set of candidate points. For example, the initial set of candidate points can be chosen at random. In other aspects, computer implemented models can be used to generate the initial set of candidate points. For example, content based models (such as deep video understanding models), interaction based models (such as those using clip interaction data), or commercially available artificial intelligence (AI) based models such as those provided by Vionlabs AB of Stockholm, Sweden can be used. These models seek to obtain mood, interaction, and/or features of scenes for the multimedia content, so that candidate points are chosen to best represent the multimedia content.

In aspects, once the initial set of candidate points are generated, conversion data for each of the initial set of candidate points can be determined. Conversion data refers to how often users engage/consume the multimedia content when each of the initial set of candidate points is set as the beginning of a trailer. Thus, in aspects, for each of the initial set of candidate points, a 30 second clip can be generated as a trailer with that initial candidate point being the start of the 30 second clip. That clip can be shown to users of the multimedia system, and engagement statistics determined. In aspects, the initial set of candidate points can be input into a computer implemented model to obtain the conversion data. In aspects, the model can be a Multi-Armed Bandit (MAB) model. MAB models are known to persons of skill in the art and will not be discussed in detail. For the purposes of discussion with respect to this disclosure, it is assumed that the MAB model outputs the conversion data.

In aspects, once the conversion data is obtained, an updated set of candidate points can be sampled based on which of the initial set of candidate points is determined by the MAB model to have the highest conversion data. For example, the sampling can include choosing the updated set of candidate points close to the initial set of candidate points with the highest rates of conversion. Alternatively, in another aspect, the updated set of candidate points can be sampled randomly without considering the conversion data.

In aspects, once the updated set of candidate points are obtained, an estimated mean and an upper bound for each of the updated set of candidate points can be determined. The purpose of determining the estimated mean and the upper bound for each of the updated set of candidate points is twofold. First, the purpose is to exploit knowledge of which of the updated set of candidate points best engage the users, and second is to use this knowledge to explore further candidate points around each of the updated set of candidate points that might better engage the users. Thus, the purpose of performing these computations is to optimize for the best point to use as the starting point of the trailer.

In aspects, a value for each of the updated set of candidate points can be computed. In aspects, the value can be computed based on adding the estimated mean and the upper bound for each of the updated set of candidate points. In aspects, a ranked list can be generated based on the value computed for each of the updated set of candidate points. Once completed, this process can be repeated for a new set of sampled points used as the initial set of candidate points. In aspects, this process can be repeated until a termination condition is reached. The termination condition can result in an output indicating an optimal point for generating the trailer. The termination condition can be a predetermined number of iterations or can be a detection that no further improvement in the conversion values is obtained for the candidate points input into the MAB model.

In aspects, further customization can be done such that if an optimal point is obtained as a result of the processes described above, it can be further analyzed to determine whether it should or should not be used as the start of the trailer. For example, if the output of the processes described above results in a point in which the trailer results in nudity, explicit content, or a turning point in the multimedia content that if shown would result in the plot being spoiled, the point can be filtered so as to not use that optimal point. The filtering can be based on categorizations indicating the output is a scene that should not be shown. Thus, the next available optimal candidate point can be used instead.
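The filtering described above can be pictured with a minimal sketch. It assumes the ranked candidate points are already available as a list ordered from best to worst, and that each point carries a set of category labels; the function name `first_allowed_point`, the label strings, and the shape of the `categories` mapping are hypothetical and chosen only for illustration.

```python
# Hypothetical labels for scenes that should not start a trailer.
BLOCKED = frozenset({"nudity", "explicit", "spoiler"})


def first_allowed_point(ranked_points, categories, blocked=BLOCKED):
    """Return the best-ranked point whose categories are not blocked.

    ranked_points: start times (seconds), ordered best first.
    categories:    mapping from start time to a set of category labels
                   (assumed to come from some scene classifier).
    Returns None when every candidate is filtered out.
    """
    for point in ranked_points:
        labels = categories.get(point, set())
        if not (labels & blocked):
            return point
    return None


ranked = [295.0, 20.0, 610.0]
labels = {295.0: {"spoiler"}, 20.0: {"action"}}
print(first_allowed_point(ranked, labels))  # 20.0
```

Walking the ranked list in order is one simple way to realize "the next available optimal candidate point"; other fallback policies are possible.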

BRIEF DESCRIPTION OF THE FIGURES

The accompanying drawings are incorporated herein and form a part of the specification.

FIG. 1 illustrates a block diagram of a multimedia environment, according to some aspects.

FIG. 2 illustrates a block diagram of a streaming media device, according to some aspects.

FIG. 3 is a flowchart illustrating a process for automated preview (or trailer) generation for multimedia content, according to some aspects.

FIG. 4 illustrates an example computer system useful for implementing various aspects.

In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.

DETAILED DESCRIPTION

Provided herein are system, apparatus, device, method and/or computer program product aspects, and/or combinations and sub-combinations thereof, for automated trailer (or preview) generation for multimedia content. Various aspects of this disclosure may be implemented using and/or may be part of a multimedia environment 102 shown in FIG. 1. It is noted, however, that multimedia environment 102 is provided solely for illustrative purposes, and is not limiting. Aspects of this disclosure may be implemented using and/or may be part of environments different from and/or in addition to the multimedia environment 102, as will be appreciated by persons skilled in the relevant art(s) based on the teachings contained herein. An example of the multimedia environment 102 shall now be described.

Multimedia Environment

FIG. 1 illustrates a block diagram of a multimedia environment 102, according to some aspects. In a non-limiting example, multimedia environment 102 may be directed to streaming media. However, this disclosure is applicable to any type of media (instead of or in addition to streaming media), as well as any mechanism, means, protocol, method and/or process for distributing media.

The multimedia environment 102 may include one or more media systems 104. A media system 104 could represent a family room, a kitchen, a backyard, a home theater, a school classroom, a library, a car, a boat, a bus, a plane, a movie theater, a stadium, an auditorium, a park, a bar, a restaurant, or any other location or space where it is desired to receive and play streaming content. User(s) 132 may operate with the media system 104 to select and consume content.

Each media system 104 may include one or more media devices 106 each coupled to one or more display devices 108. It is noted that terms such as “coupled,” “connected to,” “attached,” “linked,” “combined” and similar terms may refer to physical, electrical, magnetic, logical, etc., connections, unless otherwise specified herein.

Media device 106 may be a streaming media device, DVD or BLU-RAY device, audio/video playback device, cable box, and/or digital video recording device, to name just a few examples. Display device 108 may be a monitor, television (TV), computer, smart phone, tablet, wearable (such as a watch or glasses), appliance, internet of things (IoT) device, and/or projector, to name just a few examples. In some aspects, media device 106 can be a part of, integrated with, operatively coupled to, and/or connected to its respective display device 108.

Each media device 106 may be configured to communicate with network 118 via a communication device 114. The communication device 114 may include, for example, a cable modem or satellite TV transceiver. The media device 106 may communicate with the communication device 114 over a link 116, wherein the link 116 may include wireless (such as WiFi) and/or wired connections.

In various aspects, the network 118 can include, without limitation, wired and/or wireless intranet, extranet, Internet, cellular, Bluetooth, infrared, and/or any other short range, long range, local, regional, global communications mechanism, means, approach, protocol and/or network, as well as any combination(s) thereof.

Media system 104 may include a remote control 110. The remote control 110 can be any component, part, apparatus and/or method for controlling the media device 106 and/or display device 108, such as a remote control, a tablet, laptop computer, smartphone, wearable, on-screen controls, integrated control buttons, audio controls, or any combination thereof, to name just a few examples. In an aspect, the remote control 110 wirelessly communicates with the media device 106 and/or display device 108 using cellular, Bluetooth, infrared, etc., or any combination thereof. The remote control 110 may include a microphone 112, which is further described below.

The multimedia environment 102 may include a plurality of content servers 120 (also called content providers, channels or sources 120). Although only one content server 120 is shown in FIG. 1, in practice the multimedia environment 102 may include any number of content servers 120. Each content server 120 may be configured to communicate with network 118.

Each content server 120 may store content 122 and metadata 124. Content 122 may include any combination of music, videos, movies, TV programs, multimedia, images, still pictures, text, graphics, gaming applications, advertisements, programming content, public service content, government content, local community content, software, and/or any other content or data objects in electronic form.

In some aspects, metadata 124 comprises data about content 122. For example, metadata 124 may include associated or ancillary information indicating or related to writer, director, producer, composer, artist, actor, summary, chapters, production, history, year, trailers, alternate versions, related content, applications, and/or any other information pertaining or relating to the content 122. Metadata 124 may also or alternatively include links to any such information pertaining or relating to the content 122. Metadata 124 may also or alternatively include one or more indexes of content 122, such as but not limited to a trick mode index.

The multimedia environment 102 may include one or more system servers 126. The system servers 126 may operate to support the media devices 106 from the cloud. It is noted that the structural and functional aspects of the system servers 126 may wholly or partially exist in the same or different ones of the system servers 126.

The media devices 106 may exist in thousands or millions of media systems 104. Accordingly, the media devices 106 may lend themselves to crowdsourcing aspects and, thus, the system servers 126 may include one or more crowdsource servers 128.

For example, using information received from the media devices 106 in the thousands and millions of media systems 104, the crowdsource server(s) 128 may identify similarities and overlaps between closed captioning requests issued by different users 132 watching a particular movie. Based on such information, the crowdsource server(s) 128 may determine that turning closed captioning on may enhance users' viewing experience at particular portions of the movie (for example, when the soundtrack of the movie is difficult to hear), and turning closed captioning off may enhance users' viewing experience at other portions of the movie (for example, when displaying closed captioning obstructs critical visual aspects of the movie). Accordingly, the crowdsource server(s) 128 may operate to cause closed captioning to be automatically turned on and/or off during future streamings of the movie.

The system servers 126 may also include an audio command processing module 130. As noted above, the remote control 110 may include a microphone 112. The microphone 112 may receive audio data from users 132 (as well as other sources, such as the display device 108). In some aspects, the media device 106 may be audio responsive, and the audio data may represent verbal commands from the user 132 to control the media device 106 as well as other components in the media system 104, such as the display device 108.

In some aspects, the audio data received by the microphone 112 in the remote control 110 is transferred to the media device 106, and is then forwarded to the audio command processing module 130 in the system servers 126. The audio command processing module 130 may operate to process and analyze the received audio data to recognize the user 132's verbal command. The audio command processing module 130 may then forward the verbal command back to the media device 106 for processing.

In some aspects, the audio data may be alternatively or additionally processed and analyzed by an audio command processing module 216 in the media device 106 (see FIG. 2). The media device 106 and the system servers 126 may then cooperate to pick one of the verbal commands to process (either the verbal command recognized by the audio command processing module 130 in the system servers 126, or the verbal command recognized by the audio command processing module 216 in the media device 106).

FIG. 2 illustrates a block diagram of an example media device 106, according to some aspects. Media device 106 may include a streaming module 202, processing module 204, storage/buffers 208, and user interface module 206. As described above, the user interface module 206 may include the audio command processing module 216.

The media device 106 may also include one or more audio decoders 212 and one or more video decoders 214.

Each audio decoder 212 may be configured to decode audio of one or more audio formats, such as but not limited to AAC, HE-AAC, AC3 (Dolby Digital), EAC3 (Dolby Digital Plus), WMA, WAV, PCM, MP3, OGG, GSM, FLAC, AU, AIFF, and/or VOX, to name just some examples.

Similarly, each video decoder 214 may be configured to decode video of one or more video formats, such as but not limited to MP4 (mp4, m4a, m4v, f4v, f4a, m4b, m4r, f4b, mov), 3GP (3gp, 3gp2, 3g2, 3gpp, 3gpp2), OGG (ogg, oga, ogv, ogx), WMV (wmv, wma, asf), WEBM, FLV, AVI, QuickTime, HDV, MXF (OP1a, OP-Atom), MPEG-TS, MPEG-2 PS, MPEG-2 TS, WAV, Broadcast WAV, LXF, GXF, and/or VOB, to name just some examples. Each video decoder 214 may include one or more video codecs, such as but not limited to H.263, H.264, HEVC, MPEG1, MPEG2, MPEG-TS, MPEG-4, Theora, 3GP, DV, DVCPRO, DVCProHD, IMX, XDCAM HD, XDCAM HD422, and/or XDCAM EX, to name just some examples.

Now referring to both FIGS. 1 and 2, in some aspects, the user 132 may interact with the media device 106 via, for example, the remote control 110. For example, the user 132 may use the remote control 110 to interact with the user interface module 206 of the media device 106 to select content, such as a movie, TV show, music, book, application, game, etc. The streaming module 202 of the media device 106 may request the selected content from the content server(s) 120 over the network 118. The content server(s) 120 may transmit the requested content to the streaming module 202. The media device 106 may transmit the received content to the display device 108 for playback to the user 132.

In streaming aspects, the streaming module 202 may transmit the content to the display device 108 in real time or near real time as it receives such content from the content server(s) 120. In non-streaming aspects, the media device 106 may store the content received from content server(s) 120 in storage/buffers 208 for later playback on display device 108.

Automated Trailer Generation

Referring to FIG. 1, the content servers 120, the system servers 126, or a combination thereof can be used for the automated trailer generation of the present disclosure. The trailer (or preview) generation can be for a piece of content 122. The disclosed system, method, and instructions stored on a non-transitory computer readable medium for automated trailer generation are designed to solve the technological problems outlined in the Background section. Namely, (1) scaling trailer generation, (2) optimizing for the best clips to use as trailers to better engage users for multimedia content, and (3) allowing computers to produce relevant and user engaging trailers that previously could only be produced by human operators.

The improvements stem from the use of machine learning and/or AI models that provide a novel way of generating trailers for content 122 that does not rely on human intervention. The improvements also stem from using computer implemented models/rules that use acquired knowledge of what instances of time in the content 122 invoke user engagement. In aspects, that knowledge can be acquired using the crowdsource servers 128. That knowledge is then used to determine what other surrounding instances of time may lead to more optimal engagement by users. As a result, the system, method, and instructions provide an optimization that converges on an optimal point in the content 122, based on which a trailer can be generated that is most likely to lead to user engagement with the content 122. The use of these models is a very cost effective way of trailer generation (especially when scaled) due to the lack of need for human intervention.

Additionally, the disclosed system, method, and instructions improve the state of the art from conventional systems because they allow for trailer generation to be scaled. Current methods of trailer generation require human intervention, editing, and subjective judgment. The disclosed system, method, and instructions, however, can be run on hundreds or thousands of pieces of content 122 to generate trailers, without the need of further human intervention to generate, edit, or produce the trailers. This results in an efficient use of computing and manpower.

An example aspect operates by generating an initial set of candidate points to generate a trailer for the content 122. Conversion data can be determined for each of the initial set of candidate points. An updated set of candidate points is then sampled either based on the conversion data or randomly chosen without considering the conversion data. In aspects, an estimated mean and upper bound is determined for each of the updated set of candidate points. A value is computed for each of the updated set of candidate points. A ranked list is generated based on the value computed for each of the updated set of candidate points. The process is repeated until an optimal candidate point is converged upon.
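The repeat-until-converged structure of this process can be sketched as a small driver loop. The sketch below assumes two stopping conditions, a fixed iteration budget and a check that the best conversion rate has stopped improving; the name `run_until_converged`, the `step` callback, and the `min_gain` tolerance are hypothetical and stand in for one full pass of sampling, scoring, and ranking.

```python
def run_until_converged(step, max_iters=20, min_gain=1e-4):
    """Repeat `step` until the budget is spent or improvement stalls.

    step(best_point) is assumed to run one full iteration and return
    (point, conversion_rate) for the top-ranked candidate it found.
    """
    best_point, best_rate = None, float("-inf")
    for _ in range(max_iters):
        point, rate = step(best_point)
        if rate - best_rate < min_gain:
            break  # no further meaningful improvement
        best_point, best_rate = point, rate
    return best_point, best_rate


# Toy usage: a scripted sequence of per-iteration results.
results = iter([(10.0, 0.10), (12.0, 0.14), (12.5, 0.14), (13.0, 0.20)])
print(run_until_converged(lambda best: next(results)))  # (12.0, 0.14)
```

In the toy run the loop stops at the third iteration because the rate did not improve, so the later, better result is never reached; a practical setup might tolerate a few stalled iterations before stopping.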

FIG. 3 is a flowchart illustrating a process 300 for automated preview (or trailer) generation for multimedia content, according to some aspects. Process 300 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. The software can consist of modules installed on the content servers 120, the system servers 126, or a combination thereof. The modules can perform the functions described below with respect to process 300. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3, as will be understood by a person of ordinary skill in the art.

As previously indicated, the purpose of process 300 can be to automate the preview (or trailer) generation process for multimedia content. The multimedia content can be the same as the content 122 described with respect to FIG. 1.

In aspects, process 300 can be performed using the content servers 120, the system servers 126, or a combination thereof. Thus, the content servers 120, the system servers 126, or a combination thereof can execute instructions (e.g., software) using components such as processors, memory devices, etc., to perform the processes necessary to generate the trailers for the multimedia content.

In aspects, process 300 can be implemented as instructions stored on a non-transitory computer readable medium of the content servers 120, the system servers 126, or a combination thereof. In aspects, the instructions can be processed by one or more processors to perform the functions described with respect to process 300. The non-transitory computer readable medium can be one or more memory devices.

In step 302, in order to generate a trailer for a piece of content 122, process 300 can begin by generating an initial set of candidate points for the content 122 to be used as starting points for a trailer. The initial set of candidate points can be from any time instance of the content 122. For the purposes of this disclosure, the initial set of candidate points will also be referred to as Q points. Each of the initial set of candidate points will be referred to as points {Q1, Q2, . . . , Qk}, where k is an integer value representing the maximum number of Q points.

In aspects, a variety of methods can be used to generate the initial set of candidate points. For example, the initial set of candidate points can be chosen at random. In other aspects, computer implemented models can be used to generate the initial set of candidate points. For example, content based models (such as deep video understanding models), interaction based models (such as those using clip interaction data), or commercially available models such as those provided by Vionlabs AB of Stockholm, Sweden can be used. A person of ordinary skill in the art will be familiar with such models, and they will not be discussed in detail. For the purposes of this disclosure, it is assumed that such models can be used to generate the initial set of candidate points.

The aforementioned models may seek to obtain mood, interaction, and/or features of scenes of the content 122, so that the initial set of candidate points are chosen based on the genre of the content 122 to better represent what the content 122 relates to. The assumption is that matching the clips to the genre of the content 122 will best engage users. This, however, does not necessarily have to be the case, and candidate points can be chosen that do not represent the genre of the content 122. As an example, a war movie can have candidate points selected from scenes depicting war. However, in aspects, a war movie can also have candidate points selected from scenes depicting peaceful dialog. How the candidate points are chosen can be customized.
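The random option for step 302 can be illustrated with a minimal sketch. It assumes candidate points are start times in seconds, that each start must leave room for a 30 second clip, and that fifty points are drawn; the function name `sample_initial_points` and its parameters are hypothetical, and a model-based generator could replace it entirely.

```python
import random


def sample_initial_points(duration_s, k=50, clip_len_s=30.0, seed=None):
    """Draw k random start times Q1..Qk (seconds), sorted in time order.

    Starts are limited to [0, duration_s - clip_len_s] so that a full
    clip beginning at any sampled point fits inside the content.
    """
    if duration_s <= clip_len_s:
        raise ValueError("content must be longer than the clip length")
    rng = random.Random(seed)  # seeded for reproducible illustration
    latest_start = duration_s - clip_len_s
    return sorted(rng.uniform(0.0, latest_start) for _ in range(k))


# Example: fifty candidate starts for a 90 minute title.
q_points = sample_initial_points(duration_s=5400.0, k=50, seed=0)
print(len(q_points), q_points[0] <= q_points[-1])  # 50 True
```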

In step 304, once the initial set of candidate points are generated, conversion data for each of the initial set of candidate points can be determined. Conversion data refers to how often users consume/engage with the content 122 when each of the initial set of candidate points is set as the starting point of a trailer. As an example, in aspects, for each of the initial set of candidate points, a 30 second clip can be generated as a trailer with that initial candidate point being the start of the 30 second clip. That clip can be shown to users of the multimedia environment 102, and engagement statistics can be collected. In aspects, the initial set of candidate points can be input into a computer implemented model to obtain the conversion data. In aspects, this model can be a Multi-Armed Bandit (MAB) model. MAB models are known to persons of skill in the art and will not be discussed in detail. For the purposes of discussion with respect to this disclosure, it is assumed that the MAB model will output the conversion data. The conversion data for each of the initial set of candidate points will also be referenced as {Cov(Q1), Cov(Q2), . . . , Cov(Qk)} throughout this disclosure, where k is an integer value representing the maximum number of Q points sampled.
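As a simple picture of what conversion data might look like, the sketch below computes Cov(Q) as the fraction of trailer showings that led to the user consuming the content. This plain ratio is an assumption made for illustration and is not the MAB model referred to above; the name `conversion_rates` and the two input dictionaries are hypothetical.

```python
def conversion_rates(impressions, conversions):
    """Map each candidate start time to conversions / impressions.

    impressions: start time -> number of times the clip was shown.
    conversions: start time -> number of showings followed by the
                 user consuming the content.
    Points that were never shown are skipped, since no rate exists.
    """
    rates = {}
    for point, shown in impressions.items():
        if shown <= 0:
            continue
        rates[point] = conversions.get(point, 0) / shown
    return rates


shown = {10.0: 200, 95.0: 100, 300.0: 0}
converted = {10.0: 30, 95.0: 5}
print(conversion_rates(shown, converted))  # {10.0: 0.15, 95.0: 0.05}
```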

In step 306, once the conversion data is obtained, an updated set of candidate points can be sampled based on the conversion data. The sampling can include choosing from amongst the initial set of candidate points the ones with the highest rates of conversion (i.e., the ones indicating the highest user engagement) and sampling points around those points as the updated set of candidate points. What is determined to be a high rate of conversion can be set by a designer of the system. In alternative embodiments, the updated set of candidate points can be chosen at random without considering the conversion data. For the purposes of this disclosure the updated set of candidate points will be referenced as {D1, D2, . . . , Dm}, where m is an integer representing the maximum number of updated candidate points D.

The sampling of the updated set of candidate points can be done in a variety of ways. In aspects, the sampling can be done using a predetermined threshold. For example, the predetermined threshold can be a percentage value, which can serve as a cutoff point. Thus, the initial candidate points that have conversion rates above that predetermined threshold can be chosen as the updated set of candidate points. For example, the predetermined threshold can be a conversion rate of X percent, where X is a positive number. If the conversion rate for any initial candidate point is above X percent, it can be chosen to be part of the updated set of candidate points. In aspects, the sampling can further include choosing points surrounding the initial candidate points that have high conversion rates. Thus, random selections can be made of points if they fall within a certain time range relative to the initial candidate points with high conversion rates. This can be, for example, ±X seconds of an initial candidate point with a high conversion rate. The aforementioned is merely exemplary, and other techniques can be used as recognized by a person of ordinary skill in the art reading this disclosure.
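The threshold-plus-window sampling described above can be sketched as follows. The sketch assumes conversion rates are held in a dictionary keyed by start time, keeps only initial points whose rate exceeds the threshold, and draws a few uniformly random points within a time window around each; the name `sample_updated_points`, the uniform distribution, and the `per_point` count are hypothetical choices.

```python
import random


def sample_updated_points(rates, threshold, window_s, per_point=3, seed=None):
    """Sample updated points D near high-converting initial points Q.

    rates:     start time (seconds) -> conversion rate.
    threshold: minimum rate for an initial point to be kept.
    window_s:  half-width of the time window around each kept point.
    """
    rng = random.Random(seed)
    updated = []
    for q, rate in sorted(rates.items()):
        if rate <= threshold:
            continue  # below the cutoff: do not explore around it
        for _ in range(per_point):
            d = q + rng.uniform(-window_s, window_s)
            updated.append(max(0.0, d))  # never before the content start
    return updated


rates = {10.0: 0.15, 95.0: 0.05, 300.0: 0.12}
d_points = sample_updated_points(rates, threshold=0.10, window_s=5.0, seed=1)
print(len(d_points))  # 6
```

Only the points at 10 and 300 seconds clear the 0.10 cutoff here, so six updated points are drawn, three around each.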

The above sampling methods can provide advantages over two other approaches to sampling: the so-called grid sampling approach and the local search approach. The grid sampling approach involves continually randomly sampling the content 122 to find other candidate points different from the initial set of candidate points, determining what their conversion rates are, and then choosing the ones with the highest conversion rates. This can be done continuously until all the potential starting points in the content 122 are exhausted. This approach, however, is undesirable because it is computationally expensive and takes a long time to randomly sample all the potential starting points of the content 122.

The local search approach involves using the initial set of candidate points and adding or subtracting some time from each point to see if a new point results in better engagement. This approach, however, is not desirable because it does not use the conversion data knowledge to select updated candidate points. Thus, it only iterates through points without leveraging existing knowledge about the best converting initial candidate points, and is therefore less desirable than the approach outlined in this disclosure. Additionally, it can result in situations where a seemingly optimal point is found, but that point is not a true optimal point because of the constraints introduced by adding or subtracting time from each point, which may result in the system not searching a full range of points.

In step 308, once the updated set of candidate points are obtained, an estimated mean for each of the updated set of candidate points can be determined.

In step 310, once the updated set of candidate points are obtained, an upper bound for each of the updated set of candidate points can also be determined.

The purpose of determining the estimated mean and the upper bound for each of the updated set of candidate points is twofold. First, it is to exploit knowledge of which of the updated set of candidate points best engage the users, and second is to use this knowledge to explore further candidate points around each of the updated set of candidate points that might better engage the users. Thus, the purpose of performing these computations is to optimize and converge on the best point to use as the start of the trailer. Since it is already known that certain points have higher user engagement rates as indicated by the conversion data, it is desirable to determine if any further points around those points will yield better results for user engagement.

In aspects, in order to determine an estimated mean for each of the updated set of candidate points, and assuming that an updated candidate point Dt lies between initial candidate points Qi and Qj, where i and j are integer values, and where neither i nor j can be greater than k, which is the maximum number of the initial set of candidate points Q, equation (1), shown below, can be used.

E(Cov(Dt)) = (Cov(Qi) * 1/Distance(Dt,Qi) + Cov(Qj) * 1/Distance(Dt,Qj)) / (1/Distance(Dt,Qi) + 1/Distance(Dt,Qj))  (1)

In equation (1), E(Cov(Dt)) is the estimated mean value for an updated candidate point; Cov(Qi) is the conversion rate for an initial candidate point; Distance(Dt,Qi) is the distance (in time) between the start points of the updated candidate point and the initial candidate point; Cov(Qj) is the conversion rate for a further initial candidate point; Distance(Dt,Qj) is the distance (in time) between the start points of the updated candidate point and the further initial candidate point; and i and j are integer values, where neither i nor j can be greater than k, which is the maximum number of the initial set of candidate points Q.
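As a non-limiting sketch, equation (1) can be read as an inverse-distance weighted average of the conversion rates of two initial candidate points. The example below assumes the weights are the reciprocal distances 1/Distance(Dt,Qi) and 1/Distance(Dt,Qj), and that an updated point that coincides with an initial point simply takes that point's conversion rate. The coincidence rule, the function name `estimated_mean`, and the sample values are assumptions made for illustration.

```python
def estimated_mean(d_t, q_i, cov_i, q_j, cov_j):
    """Inverse-distance weighted estimate of the conversion rate at d_t.

    d_t, q_i and q_j are start times in seconds; cov_i and cov_j are the
    observed conversion rates of the two initial candidate points.
    """
    dist_i = abs(d_t - q_i)
    dist_j = abs(d_t - q_j)
    # Assumption: avoid dividing by zero when d_t coincides with an initial point.
    if dist_i == 0:
        return cov_i
    if dist_j == 0:
        return cov_j
    w_i = 1.0 / dist_i
    w_j = 1.0 / dist_j
    return (cov_i * w_i + cov_j * w_j) / (w_i + w_j)


# Hypothetical data: Qi starts at 100 s (10% conversion), Qj at 140 s (30%).
mean_110 = estimated_mean(110.0, 100.0, 0.10, 140.0, 0.30)
print(round(mean_110, 4))
```

With these sample values the estimate for a point at 110 seconds is 0.15, which is closer to the conversion rate of the nearer initial point at 100 seconds.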

In aspects, in order to determine the upper bound for each of the updated set of candidate points, equation (2), shown below, can be used.

UpperBound(Dt) = alpha * sqrt((Min_distance(Dt,Qi)/n)) + beta * sqrt(log(n)/N(Dt))  (2)

In equation (2), UpperBound(Dt) is the upper bound value of an updated candidate point; alpha is a constant; beta is another constant; Min_distance(Dt, Qi) is the minimum distance (in time) between the start points of the updated candidate point and an initial candidate point; n is a number of iterations to perform the computation for; and N(Dt) is the number of times the content 122 is shown starting from the updated candidate point Dt.
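Equation (2) can be sketched as follows. This is an illustration only: the natural logarithm, the requirement that n be at least 1, the default values of alpha and beta, and the handling of a point that has never been shown are all assumptions, since the disclosure does not fix them. The function name `upper_bound` is likewise hypothetical.

```python
import math


def upper_bound(d_t, initial_points, n, shows, alpha=1.0, beta=1.0):
    """Exploration bonus for an updated candidate point, following equation (2).

    n is the iteration count (assumed >= 1) and shows is N(Dt), the number of
    times the content was shown starting from d_t.
    """
    if shows == 0:
        # Assumption (not stated in the source): a never-shown point receives
        # an unbounded bonus so that it is tried at least once.
        return math.inf
    min_distance = min(abs(d_t - q) for q in initial_points)
    distance_term = alpha * math.sqrt(min_distance / n)
    count_term = beta * math.sqrt(math.log(n) / shows)
    return distance_term + count_term


# Hypothetical data: initial points at 100 s and 140 s, tenth iteration.
bound = upper_bound(110.0, [100.0, 140.0], n=10, shows=5)
print(round(bound, 4))
```

In this reading, the first term grows for points far from any initial candidate point, and the second term shrinks as a point is shown more often, so rarely shown and distant points receive a larger bonus.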

In step 312, a value for each of the updated set of candidate points can be computed. In aspects, the value can be computed based on adding the estimated mean and the upper bound for each of the updated set of candidate points obtained using equations (1) and (2).

In step 314, a ranked list can be generated based on the value computed for each of the updated set of candidate points. The ranked list represents an ordered list of the updated set of candidate points that can represent the best points based on the processes performed above.
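Steps 312 and 314 can be sketched together as adding the two quantities per point and sorting in descending order. The dictionary layout, the function name `rank_candidates`, and the sample numbers below are hypothetical.

```python
def rank_candidates(estimated_means, upper_bounds):
    """Return (point, value) pairs sorted from highest to lowest value.

    Both arguments map a candidate start time to a number; the value of a
    point is the sum of its estimated mean and its upper bound.
    """
    values = {p: estimated_means[p] + upper_bounds[p] for p in estimated_means}
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


# Hypothetical estimated means and upper bounds for three updated points.
means = {110: 0.15, 125: 0.225, 135: 0.275}
bounds = {110: 0.40, 125: 0.10, 135: 0.20}
ranked = rank_candidates(means, bounds)
print([point for point, _ in ranked])
```

Here the point at 110 seconds ranks first even though its estimated mean is the lowest, because its larger upper bound reflects that it has been explored less.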

In step 316, this process can be repeated for a new set of sampled points used as the initial set of candidate points. In aspects, this process can be repeated until a termination condition is reached. The termination condition can result in an output indicating an optimal point for generating the trailer. The termination condition can be a predetermined set of iterations or can be a detection that no further improvement in the conversion values is obtained for the candidate points input into the MAB model.
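The repetition in step 316 can be sketched as a loop with the two termination conditions named above: an iteration cap and a check for no further improvement. In this sketch, `run_until_converged`, `max_iterations`, and `min_gain` are hypothetical names, improvement is measured on the top-ranked value as a simplifying assumption, and `toy_round` is a stand-in for steps 304 through 314 that scores points against a made-up peak at 90 seconds rather than real conversion data.

```python
def run_until_converged(initial_points, run_round, max_iterations=20, min_gain=1e-3):
    """Repeat rounds until the iteration cap or until the best value stops improving."""
    best_point, best_value = None, float("-inf")
    points = list(initial_points)
    for iteration in range(1, max_iterations + 1):
        ranked = run_round(points, iteration)  # (point, value) pairs, best first
        top_point, top_value = ranked[0]
        if top_value - best_value < min_gain:
            break  # no further improvement: keep the previous best point
        best_point, best_value = top_point, top_value
        points = [p for p, _ in ranked]  # ranked points seed the next round
    return best_point, best_value


def toy_round(points, iteration):
    """Hypothetical stand-in for one round: add midpoints, score, keep the top 3."""
    ordered = sorted(points)
    midpoints = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates = sorted(set(ordered + midpoints))
    scored = [(p, 1.0 / (1.0 + abs(p - 90.0))) for p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:3]


best_point, best_value = run_until_converged([0, 60, 120, 180], toy_round)
print(best_point, best_value)
```

In the toy run the loop stops in the second round, once the top-ranked value no longer improves on the value found in the first round.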

In aspects, further customization can be done such that if a candidate point is obtained as a result of the processes described above, it can be further analyzed to determine whether it is a point that results in a trailer that should not be shown. For example, if the optimized output results in a candidate point that, if used as the starting point for a trailer, results in a scene that shows nudity or explicit content, or that is a turning point in the content 122 such that showing the scene would spoil the plot, the candidate point can be filtered out so that it is not used. The next optimal candidate point can be used instead. In aspects, the filtering can be done using categorizations for what is nudity, explicit content, etc. These can be based on machine learning models that can categorize scenes. These models can be, for example, the content based models (such as deep video understanding models) previously described.
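The filtering described above can be sketched as walking the ranked list and returning the first point whose scene is not in a blocked category. The category labels, the `categorize` callable (standing in for a scene categorization model), and the function name `first_allowed` are hypothetical.

```python
BLOCKED_CATEGORIES = {"nudity", "explicit", "spoiler"}


def first_allowed(ranked_points, categorize):
    """Return the highest-ranked point whose scene has no blocked category.

    categorize maps a start time to a set of category labels for the scene
    that a trailer starting at that time would show.
    """
    for point in ranked_points:
        if not (categorize(point) & BLOCKED_CATEGORIES):
            return point
    return None  # every candidate was filtered out


# Hypothetical labels that a scene categorization model might produce.
labels = {90: {"spoiler"}, 60: set(), 120: {"explicit"}}
chosen = first_allowed([90, 60, 120], lambda p: labels.get(p, set()))
print(chosen)
```

In this example the top-ranked point at 90 seconds is skipped because its scene is labeled as a spoiler, and the next point in the ranked list is used instead.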

The process 300 described in FIG. 3 may be implemented as instructions stored on a non-transitory computer readable medium to be executed by one or more computing units such as a processor, a special purpose computer, an integrated circuit, integrated circuit cores, or a combination thereof. The non-transitory computer readable medium may be implemented with any number of memory units, such as a volatile memory, a nonvolatile memory, an internal memory, an external memory, or a combination thereof. The non-transitory computer readable medium may be integrated as a part of any of the servers or devices of the system, or installed as a removable portion of the servers or devices of the system.

Architecture of Computer Systems Implementing Process 300

FIG. 4 illustrates an example computer system 400 useful for implementing various aspects. In aspects, the computer system 400 may be the components of the servers (e.g., content servers 120 or system servers 126) that are used to implement the functions of the process 300. In aspects, the computer system 400 may include a control unit 402, a storage unit 406, a communication unit 416, and a user interface 412. The control unit 402 may include a control interface 404. The control unit 402 may execute a software 410 to provide some or all of the intelligence of computer system 400. The control unit 402 may be implemented in a number of different ways. For example, the control unit 402 may be a processor, an application specific integrated circuit (ASIC), an embedded processor, a microprocessor, a hardware control logic, a hardware finite state machine (FSM), a digital signal processor (DSP), a field programmable gate array (FPGA), a graphics processing unit (GPU), or a combination thereof.

The control interface 404 may be used for communication between the control unit 402 and other functional units or devices of computer system 400. The control interface 404 may also be used for communication that is external to the functional units or devices of computer system 400. The control interface 404 may receive information from the functional units or devices of computer system 400, or from remote devices 420 such as databases used in conjunction with the computer system 400, or may transmit information to the functional units or devices of computer system 400, or to remote devices 420. The remote devices 420 refer to units or devices external to computer system 400.

The control interface 404 may be implemented in different ways and may include different implementations depending on which functional units or devices of computer system 400 or remote devices 420 are being interfaced with the control unit 402. For example, the control interface 404 may be implemented with optical circuitry, waveguides, wireless circuitry, wireline circuitry to attach to a bus, an application programming interface, or a combination thereof. The control interface 404 may be connected to a communication infrastructure 422, such as a bus, to interface with the functional units or devices of computer system 400 or remote devices 420.

The storage unit 406 may store the software 410. For illustrative purposes, the storage unit 406 is shown as a single element, although it is understood that the storage unit 406 may be a distribution of storage elements. Also for illustrative purposes, the storage unit 406 is shown as a single hierarchy storage system, although it is understood that the storage unit 406 may be in a different configuration. For example, the storage unit 406 may be formed with different storage technologies forming a memory hierarchical system including different levels of caching, main memory, rotating media, or off-line storage. The storage unit 406 may be a volatile memory, a nonvolatile memory, an internal memory, an external memory, or a combination thereof. For example, the storage unit 406 may be a nonvolatile storage such as nonvolatile random access memory (NVRAM), Flash memory, disk storage, or a volatile storage such as static random access memory (SRAM) or dynamic random access memory (DRAM).

The storage unit 406 may include a storage interface 408. The storage interface 408 may be used for communication between the storage unit 406 and other functional units or devices of computer system 400. The storage interface 408 may also be used for communication that is external to computer system 400. The storage interface 408 may receive information from the other functional units or devices of computer system 400 or from remote devices 420, or may transmit information to the other functional units or devices of computer system 400 or to remote devices 420. The storage interface 408 may include different implementations depending on which functional units or devices of computer system 400 or remote devices 420 are being interfaced with the storage unit 406. The storage interface 408 may be implemented with technologies and techniques similar to the implementation of the control interface 404.

The communication unit 416 may enable communication to devices, components, modules, or units of computer system 400 or to remote devices 420. For example, the communication unit 416 may permit the computer system 400 to communicate between its components such as the content servers 120, the media system 104, and the system servers 126. The communication unit 416 may further permit the devices of computer system 400 to communicate with remote devices 420 such as an attachment, a peripheral device, or a combination thereof through network 118.

As previously indicated with respect to FIG. 1, the network 118 may span and represent a variety of networks and network topologies. For example, the network 118 may be a part of a network and include wireless communication, wired communication, optical communication, ultrasonic communication, or a combination thereof. For example, satellite communication, cellular communication, Bluetooth, Infrared Data Association standard (IrDA), wireless fidelity (WiFi), and worldwide interoperability for microwave access (WiMAX) are examples of wireless communication that may be included in the network 118. Cable, Ethernet, digital subscriber line (DSL), fiber optic lines, fiber to the home (FTTH), and plain old telephone service (POTS) are examples of wired communication that may be included in the network 118. Further, the network 118 may traverse a number of network topologies and distances. For example, the network 118 may include direct connection, personal area network (PAN), local area network (LAN), metropolitan area network (MAN), wide area network (WAN), or a combination thereof.

The communication unit 416 may also function as a communication hub allowing computer system 400 to function as part of the network 118 and not be limited to be an end point or terminal unit to the network 118. The communication unit 416 may include active and passive components, such as microelectronics or an antenna, for interaction with the network 118.

The communication unit 416 may include a communication interface 418. The communication interface 418 may be used for communication between the communication unit 416 and other functional units or devices of computer system 400 or to remote devices 420. The communication interface 418 may receive information from the other functional units or devices of computer system 400, or from remote devices 420, or may transmit information to the other functional units or devices of the computer system 400 or to remote devices 420. The communication interface 418 may include different implementations depending on which functional units or devices are being interfaced with the communication unit 416. The communication interface 418 may be implemented with technologies and techniques similar to the implementation of the control interface 404.

The user interface 412 may present information generated by computer system 400. The user interface 412 may interact with input devices and an output device. Examples of the input device of the user interface 412 may include a keypad, buttons, switches, touchpads, soft-keys, a keyboard, a mouse, or any combination thereof to provide data and communication inputs. Examples of the output device may include a display interface 414. The control unit 402 may operate the user interface 412 to present information generated by computer system 400. The control unit 402 may also execute the software 410 to present information generated by computer system 400, or to control other functional units of computer system 400. The display interface 414 may be any graphical user interface such as a display, a projector, a video screen, or any combination thereof.

The terms “module” or “unit” referred to in this disclosure can include software, hardware, or a combination thereof in an aspect of the present disclosure in accordance with the context in which the term is used. For example, the software may be machine code, firmware, embedded code, or application software. Also for example, the hardware may be circuitry, a processor, a special purpose computer, an integrated circuit, integrated circuit cores, passive devices, or a combination thereof. Further, if a module or unit is written in the system or apparatus claims section below, the module or unit is deemed to include hardware circuitry for the purposes and the scope of the system or apparatus claims.

The modules and units in the aforementioned description of the aspects may be coupled to one another as described or as shown. The coupling may be direct or indirect, without or with intervening items between coupled modules or units. The coupling may be by physical contact or by communication between modules or units.

CONCLUSION

The sections set forth one or more but not all exemplary aspects as contemplated by the inventor(s), and thus, are not intended to limit this disclosure or the appended claims in any way.

While this disclosure describes exemplary aspects for exemplary fields and applications, it should be understood that the disclosure is not limited thereto. Other aspects and modifications thereto are possible, and are within the scope and spirit of this disclosure. For example, and without limiting the generality of this paragraph, aspects are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, aspects (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.

Aspects have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative aspects can perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.

References herein to “one aspect,” “an aspect,” “an example aspect,” or similar phrases, indicate that the aspect described may include a particular feature, structure, or characteristic, but every aspect may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same aspect. Further, when a particular feature, structure, or characteristic is described in connection with an aspect, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other aspects whether or not explicitly mentioned or described herein. Additionally, some aspects can be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some aspects can be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, can also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.

The breadth and scope of this disclosure should not be limited by any of the above-described exemplary aspects, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
 1. A computer implemented method for automated trailer generation, the method comprising: (a) generating, by one or more computing devices, an initial set of candidate points to generate a trailer for a media content; (b) determining conversion data for each of the initial set of candidate points; (c) generating an updated set of candidate points based on the conversion data; (d) determining an estimated mean for each of the updated set of candidate points; (e) determining an upper bound for each of the updated set of candidate points; (f) computing a value for each of the updated set of candidate points, wherein the value is computed based on adding the estimated mean and the upper bound for each of the updated set of candidate points; (g) generating a ranked list based on the value computed for each of the updated set of candidate points; and (h) performing (b)-(g), using a new initial set of candidate points generated and input into a Multi-Armed Bandit (MAB) model until a termination condition is reached, wherein the termination condition results in an output indicating an optimal point for generating the trailer.
 2. The method of claim 1, wherein generating the initial set of candidate points is based on utilizing a content based model or an interaction based model to choose the initial set of candidate points.
 3. The method of claim 1, wherein determining the conversion data is based on processing the initial set of candidate points using the MAB model to obtain the conversion data.
 4. The method of claim 1, further comprising sampling the initial set of candidate points to obtain the updated set of candidate points based on sampling the initial set of candidate points having conversion data above a predetermined threshold.
 5. The method of claim 1, wherein the estimated mean for each of the updated set of candidate points is determined by: E(Cov(Dt)) = (Cov(Qi) * 1/Distance(Dt,Qi) + Cov(Qj) * 1/Distance(Dt,Qj)) / (1/Distance(Dt,Qi) + 1/Distance(Dt,Qj)), wherein, E(Cov(Dt)) is an estimated mean value for an updated candidate point; Cov(Qi) is a conversion rate for an initial candidate point; Distance(Dt,Qi) is a distance in time between a start of the updated candidate point and the initial candidate point; Cov(Qj) is a conversion rate for a further initial candidate point; Distance(Dt,Qj) is a distance in time between a start of the updated candidate point and the further initial candidate point; and i and j are integer values, where either i or j cannot be greater than a maximum number of the initial set of candidate points Q.
 6. The method of claim 1, wherein the upper bound for each of the updated set of candidate points is determined by: UpperBound(Dt) = alpha * sqrt((Min_distance(Dt,Qi)/n)) + beta * sqrt(log(n)/N(Dt)), wherein, UpperBound(Dt) is an upper bound value of an updated candidate point; alpha is a constant; beta is a constant; Min_distance(Dt, Qi) is a minimum distance in time between the start of the updated candidate point and an initial candidate point; n is a number of iterations to perform the determination for; and N(Dt) is the number of times the media content is shown starting from the updated candidate point.
 7. The method of claim 1, further comprising filtering the output based on a categorization indicating the output is a scene that should not be shown.
 8. A non-transitory computer readable medium including instructions for automated trailer generation that when performed by a computing system, cause the computing system to perform operations comprising: (a) generating, by one or more computing devices, an initial set of candidate points to generate a trailer for a media content; (b) determining conversion data for each of the initial set of candidate points; (c) generating an updated set of candidate points based on the conversion data; (d) determining an estimated mean for each of the updated set of candidate points; (e) determining an upper bound for each of the updated set of candidate points; (f) computing a value for each of the updated set of candidate points, wherein the value is computed based on adding the estimated mean and the upper bound for each of the updated set of candidate points; (g) generating a ranked list based on the value computed for each of the updated set of candidate points; and (h) performing (b)-(g) using a new initial set of candidate points generated and input into a Multi-Armed Bandit (MAB) model until a termination condition is reached, wherein the termination condition results in an output indicating an optimal point for generating the trailer.
 9. The non-transitory computer readable medium of claim 8, wherein generating the initial set of candidate points is based on utilizing a content based model or an interaction based model to choose the initial set of candidate points.
 10. The non-transitory computer readable medium of claim 8, wherein determining the conversion data is based on processing the initial set of candidate points using the MAB model to obtain the conversion data.
 11. The non-transitory computer readable medium of claim 8, wherein the operations further comprise sampling the initial set of candidate points to obtain the updated set of candidate points based on sampling the initial set of candidate points having conversion data above a predetermined threshold.
 12. The non-transitory computer readable medium of claim 8, wherein the estimated mean for each of the updated set of candidate points is determined by: E(Cov(Dt)) = (Cov(Qi) * 1/Distance(Dt,Qi) + Cov(Qj) * 1/Distance(Dt,Qj)) / (1/Distance(Dt,Qi) + 1/Distance(Dt,Qj)), wherein, E(Cov(Dt)) is an estimated mean value for an updated candidate point; Cov(Qi) is a conversion rate for an initial candidate point; Distance(Dt,Qi) is a distance in time between a start of the updated candidate point and the initial candidate point; Cov(Qj) is a conversion rate for a further initial candidate point; Distance(Dt,Qj) is a distance in time between a start of the updated candidate point and the further initial candidate point; and i and j are integer values, where either i or j cannot be greater than a maximum number of the initial set of candidate points Q.
 13. The non-transitory computer readable medium of claim 8, wherein the upper bound for each of the updated set of candidate points is determined by: UpperBound(Dt) = alpha * sqrt((Min_distance(Dt,Qi)/n)) + beta * sqrt(log(n)/N(Dt)), wherein, UpperBound(Dt) is an upper bound value of an updated candidate point; alpha is a constant; beta is a constant; Min_distance(Dt, Qi) is a minimum distance in time between a start of the updated candidate point and an initial candidate point; n is a number of iterations to perform the determination for; and N(Dt) is the number of times the media content is shown starting from the updated candidate point.
 14. The non-transitory computer readable medium of claim 8, wherein the operations further comprise filtering the output based on a categorization indicating the output is a scene that should not be shown.
 15. A computing system for automated trailer generation comprising: a memory storing instructions; and one or more processors, coupled to the memory, configured to process the stored instructions to: (a) generate an initial set of candidate points to generate a trailer for a media content; (b) determine conversion data for each of the initial set of candidate points; (c) generate an updated set of candidate points based on the conversion data; (d) determine an estimated mean for each of the updated set of candidate points; (e) determine an upper bound for each of the updated set of candidate points; (f) compute a value for each of the updated set of candidate points, wherein the value is computed based on adding the estimated mean and the upper bound for each of the updated set of candidate points; (g) generate a ranked list based on the value computed for each of the updated set of candidate points; (h) perform (b)-(g) using a new initial set of candidate points generated and input into a Multi-Armed Bandit (MAB) model until a termination condition is reached, wherein the termination condition results in an output indicating an optimal point for generating the trailer; and wherein, generating the initial set of candidate points is based on utilizing a content based model or an interaction based model to choose the initial set of candidate points.
 16. The computing system of claim 15, wherein determining the conversion data is based on processing the initial set of candidate points using the MAB model to obtain the conversion data.
 17. The computing system of claim 15, wherein the one or more processors are further configured to sample the initial set of candidate points to obtain the updated set of candidate points based on sampling the initial set of candidate points having conversion data above a predetermined threshold.
 18. The computing system of claim 15, wherein the estimated mean for each of the updated set of candidate points is determined by: E(Cov(Dt)) = (Cov(Qi) * 1/Distance(Dt,Qi) + Cov(Qj) * 1/Distance(Dt,Qj)) / (1/Distance(Dt,Qi) + 1/Distance(Dt,Qj)), wherein, E(Cov(Dt)) is an estimated mean value for an updated candidate point; Cov(Qi) is a conversion rate for an initial candidate point; Distance(Dt,Qi) is a distance in time between a start of the updated candidate point and the initial candidate point; Cov(Qj) is a conversion rate for a further initial candidate point; Distance(Dt,Qj) is a distance in time between a start of the updated candidate point and the further initial candidate point; and i and j are integer values, where either i or j cannot be greater than a maximum number of the initial set of candidate points Q.
 19. The computing system of claim 15, wherein the upper bound for each of the updated set of candidate points is determined by: UpperBound(Dt) = alpha * sqrt((Min_distance(Dt,Qi)/n)) + beta * sqrt(log(n)/N(Dt)), wherein, UpperBound(Dt) is an upper bound value of an updated candidate point; alpha is a constant; beta is a constant; Min_distance(Dt, Qi) is a minimum distance in time between a start of the updated candidate point and an initial candidate point; n is a number of iterations to perform the determination for; and N(Dt) is the number of times the media content is shown starting from the updated candidate point.
 20. The computing system of claim 15, wherein the one or more processors are further configured to filter the output based on a categorization indicating the output is a scene that should not be shown.